Towards a self-review question-generator for Wikipedia articles

Andrew K F Lui and Ng Sin-chun
Open University of Hong Kong
Hong Kong SAR, China

Li Siu Cheung
Hong Kong Baptist University
Hong Kong SAR, China

Wikipedia has become a major resource for teaching and learning. Although some scepticism remains about its reliability and scholarly standard, many schools and universities have incorporated Wikipedia articles into their syllabuses. With increased effort to manage the reliability of Wikipedia articles, and with proper reviewing by teachers, there is little reason to reject this free, timely and abundant source of learning content. When a student is asked to read an article, it is good practice to provide self-review questions to guide the focus of reading. Such self-review questions assess knowledge of the major concepts in the text and the relations between them. The project outlined in this paper aims to develop a self-review question-generator for Wikipedia articles. The system exploits the stylistic and structural uniformity of Wikipedia articles and applies natural language processing techniques to the article text to identify the key concepts and create relevant questions. Part of the system is a wrapper interface that displays an article alongside dynamically generated exercises.

Questioning and answering is often central to the learning process, with good questions motivating students to find answers. As designing questions for students is an intellectually challenging task, it is clearly very difficult to automate this process. A good question in a teaching-learning context should be short, clear, at the right level and phrased in an appropriate manner. There are many types of questions, most often categorized according to their purpose or intention. Of these, the closed type, which has a specific answer, is the easiest to automate fully: given a known answer and its context, one can design an algorithm to formulate a simple question. Several systems have been proposed for generating cloze tests that assess contextual understanding and vocabulary. These systems operate by identifying a vocabulary item in a sentence and removing it, so that students must infer the missing word from the sentence context. The key is to find vocabulary at the appropriate level of challenge for the students.
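
The cloze mechanism described above can be illustrated with a minimal Python sketch. It is not taken from any of the systems mentioned; the function names `pick_target` and `make_cloze`, and the `word_levels` table that assigns each word a difficulty level, are assumptions introduced purely for illustration.

```python
import re

def pick_target(sentence, word_levels, level):
    """Return the first word whose (assumed) difficulty equals `level`.

    `word_levels` is a hypothetical lookup table mapping lower-case
    words to a difficulty level; a real system would derive it from a
    graded vocabulary list or corpus frequencies.
    """
    for word in re.findall(r"[A-Za-z]+", sentence):
        if word_levels.get(word.lower()) == level:
            return word
    return None

def make_cloze(sentence, target, blank="_____"):
    """Blank out the first whole-word occurrence of `target`.

    Returns (question, answer), or None if the target is absent.
    """
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    match = pattern.search(sentence)
    if match is None:
        return None
    question = sentence[:match.start()] + blank + sentence[match.end():]
    return question, match.group(0)

# Usage: choose a level-3 word and remove it from the sentence.
sentence = "The mitochondrion is the powerhouse of the cell."
word_levels = {"mitochondrion": 3, "powerhouse": 2, "cell": 1}
target = pick_target(sentence, word_levels, level=3)
cloze = make_cloze(sentence, target)
```

Selecting the target by difficulty level reflects the point that the vocabulary must match the students' level of challenge; how that level is estimated is left open here.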

This paper describes Wikaquest, a system that generates self-review questions for Wikipedia articles. Many of these questions are of the closed type, asking for specific conceptual knowledge explicitly quoted in the text. Wikaquest operates in the following manner:

downloading a Wikipedia article and filtering out its non-text content;
applying part-of-speech tagging to all the words;
applying a noun phrase chunker to find all the noun phrases and considering these as possible concepts for questions;
using a thematic relation extractor to build a concept table that records the concepts, their relations and their thematic roles;
applying stemming, verb normalization and other typical text pre-processing tools before collecting statistics on the one-word and two-word terms;
calculating the importance of these terms using statistical measures such as co-occurrence and term frequency-inverse document frequency (TF-IDF); and
treating the important terms as key concepts and as the answers to the questions to be generated, then choosing a key concept, looking up its related concepts and thematic relations in the concept table, and omitting the chosen key concept to construct a question.
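
The final step can be sketched as follows. This is only an illustrative reading of the description, not Wikaquest's actual implementation: the `Relation` record with `agent`, `verb` and `theme` fields is an assumed, simplified stand-in for a row of the concept table, and `build_question` and `questions_for` are hypothetical helper names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    """One assumed row of the concept table."""
    agent: str   # concept in the agent (subject) role
    verb: str    # normalized relation linking the two concepts
    theme: str   # concept in the theme (object) role

def build_question(relation, key_concept, blank="_____"):
    """Omit `key_concept` from a relation to form a closed question.

    Returns None when the key concept does not take part in the relation.
    """
    if key_concept not in (relation.agent, relation.theme):
        return None
    parts = [relation.agent, relation.verb, relation.theme]
    filled = [blank if part == key_concept else part for part in parts]
    return " ".join(filled) + "."

def questions_for(key_concept, concept_table):
    """Collect (question, answer) pairs for every relation of a key concept."""
    pairs = []
    for relation in concept_table:
        question = build_question(relation, key_concept)
        if question is not None:
            pairs.append((question, key_concept))
    return pairs

# Usage with a two-row illustrative concept table.
concept_table = [
    Relation("Wikipedia", "is hosted by", "the Wikimedia Foundation"),
    Relation("Jimmy Wales", "co-founded", "Wikipedia"),
]
questions = questions_for("Wikipedia", concept_table)
```

Because the omitted key concept is itself the answer, each generated question is of the closed type with a known answer.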

In essence, the operation combines syntactic analysis, which identifies the concepts and their relations, with statistical analysis, which estimates the importance of the concepts so that appropriate ones can be selected for question construction.
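
The statistical side can be illustrated with a small TF-IDF sketch in Python. It assumes one common smoothed formulation (the article counts as one extra document, so every term has a document frequency of at least one) and a hypothetical set of background documents for comparison; the paper does not specify the exact weighting, and the function names are illustrative only.

```python
import math
from collections import Counter

def one_and_two_word_terms(tokens):
    """Return the one-word terms followed by the adjacent two-word terms."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams

def tf_idf(article_terms, background_docs):
    """Score each term of the article against background documents.

    `background_docs` is an assumed list of term sets, one per document.
    The article is counted as one further document, which keeps the
    document frequency positive and the logarithm well defined.
    """
    counts = Counter(article_terms)
    total = len(article_terms)
    n_docs = len(background_docs) + 1
    scores = {}
    for term, count in counts.items():
        doc_freq = 1 + sum(1 for doc in background_docs if term in doc)
        scores[term] = (count / total) * math.log(n_docs / doc_freq)
    return scores

def top_terms(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]

# Usage: "neuron" is frequent in the article and absent elsewhere.
article = ["neuron", "signal", "neuron", "cell"]
background = [{"cell", "signal"}, {"cell"}]
scores = tf_idf(article, background)
key_concepts = top_terms(scores, 2)
```

Terms that appear in every document receive a score of zero under this formulation, so only terms distinctive to the article are proposed as key concepts.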

The remainder of the paper describes the operation of Wikaquest in more detail. It also analyses the problem of automatic question-generation, reviews relevant existing systems, and outlines the prototype implementation of Wikaquest and its evaluation.